{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Firecrawl Tool Example\n",
    "\n",
    "This notebook demonstrates how to use the FirecrawlTool to scrape, crawl, and map websites.\n",
    "\n",
    "## Setup\n",
    "\n",
    "First, make sure you have the required dependencies:\n",
    "\n",
    "```bash\n",
    "pip install firecrawl-py\n",
    "```\n",
    "\n",
    "You'll also need a Firecrawl API key from [firecrawl.dev](https://firecrawl.dev/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "from autogen.tools.experimental.firecrawl import FirecrawlTool\n",
    "\n",
    "# Set your Firecrawl API key\n",
    "# Option 1: Set as environment variable\n",
    "# os.environ[\"FIRECRAWL_API_KEY\"] = \"fc-your-api-key-here\"\n",
    "\n",
    "# Option 2: Pass directly to the tool\n",
    "firecrawl_api_key = \"fc-your-api-key-here\"  # Replace with your actual API key\n",
    "\n",
    "# Initialize the tool\n",
    "firecrawl_tool = FirecrawlTool(firecrawl_api_key=firecrawl_api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Example 1: Scraping a Single URL\n",
    "\n",
    "The default functionality of the FirecrawlTool is to scrape a single URL."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scrape a single URL\n",
    "results = firecrawl_tool(url=\"https://firecrawl.dev\")\n",
    "\n",
    "print(f\"Number of results: {len(results)}\")\n",
    "if results:\n",
    "    result = results[0]\n",
    "    print(f\"Title: {result['title']}\")\n",
    "    print(f\"URL: {result['url']}\")\n",
    "    print(f\"Content preview: {result['content'][:200]}...\")\n",
    "    print(f\"Metadata: {result['metadata']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "## Example 2: Scraping with Options\n",
    "\n",
    "You can customize the scraping process with various options."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scrape with custom options\n",
    "results = firecrawl_tool(\n",
    "    url=\"https://firecrawl.dev\",\n",
    "    formats=[\"markdown\", \"html\"],  # Get both markdown and HTML\n",
    "    include_tags=[\"h1\", \"h2\", \"p\"],  # Only include specific HTML tags\n",
    "    exclude_tags=[\"script\", \"style\"],  # Exclude these tags\n",
    "    wait_for=2000,  # Wait 2 seconds for page to load\n",
    "    timeout=10000,  # 10 second timeout\n",
    ")\n",
    "\n",
    "if results:\n",
    "    result = results[0]\n",
    "    print(f\"Title: {result['title']}\")\n",
    "    print(f\"Content preview: {result['content'][:300]}...\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "## Example 3: Crawling a Website\n",
    "\n",
    "Use the crawl method to recursively crawl multiple pages of a website."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Crawl a website\n",
    "crawl_results = firecrawl_tool.crawl(\n",
    "    url=\"https://firecrawl.dev\",\n",
    "    limit=3,  # Crawl maximum 3 pages\n",
    "    formats=[\"markdown\"],\n",
    "    max_depth=2,  # Maximum depth of 2 levels\n",
    "    include_paths=[\"/docs/*\"],  # Only crawl documentation pages\n",
    ")\n",
    "\n",
    "print(f\"Number of crawled pages: {len(crawl_results)}\")\n",
    "for i, page in enumerate(crawl_results):\n",
    "    print(f\"\\nPage {i + 1}:\")\n",
    "    print(f\"  Title: {page['title']}\")\n",
    "    print(f\"  URL: {page['url']}\")\n",
    "    print(f\"  Content preview: {page['content'][:150]}...\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "## Example 4: Mapping a Website\n",
    "\n",
    "Use the map method to discover URLs from a website without scraping content."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Map a website to get URLs\n",
    "map_results = firecrawl_tool.map(\n",
    "    url=\"https://firecrawl.dev\",\n",
    "    search=\"docs\",  # Search for URLs containing \"docs\"\n",
    "    include_subdomains=False,\n",
    "    limit=10,  # Get maximum 10 URLs\n",
    ")\n",
    "\n",
    "print(f\"Number of URLs found: {len(map_results)}\")\n",
    "for i, url_info in enumerate(map_results):\n",
    "    print(f\"  {i + 1}. {url_info['url']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": [
    "## Example 5: Using with AG2 Agents\n",
    "\n",
    "The FirecrawlTool can be easily integrated with AG2 agents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "from autogen import AssistantAgent, UserProxyAgent\n",
    "\n",
    "# Create an assistant agent with the Firecrawl tool\n",
    "assistant = AssistantAgent(\n",
    "    name=\"web_scraper\",\n",
    "    system_message=\"You are a helpful web scraping assistant. Use the Firecrawl tool to scrape content from websites when asked.\",\n",
    "    llm_config={\n",
    "        \"model\": \"gpt-5-nano\",\n",
    "        \"api_key\": \"your-openai-api-key\",  # Replace with your OpenAI API key\n",
    "    },\n",
    ")\n",
    "\n",
    "# Register the Firecrawl tool with the assistant\n",
    "firecrawl_tool.register_for_llm(assistant)\n",
    "\n",
    "# Create a user proxy agent\n",
    "user = UserProxyAgent(\n",
    "    name=\"user\",\n",
    "    human_input_mode=\"NEVER\",\n",
    "    code_execution_config=False,\n",
    ")\n",
    "\n",
    "# Example conversation\n",
    "response = user.initiate_chat(\n",
    "    assistant,\n",
    "    message=\"Please scrape the content from https://firecrawl.dev and summarize what Firecrawl is.\",\n",
    "    max_turns=3,\n",
    ")\n",
    "\n",
    "print(\"Chat completed!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "## Use Cases\n",
    "\n",
    "The FirecrawlTool is useful for:\n",
    "\n",
    "1. **Content Extraction**: Scrape clean, formatted content from web pages\n",
    "2. **Website Discovery**: Map websites to understand their structure\n",
    "3. **Documentation Crawling**: Crawl entire documentation sites\n",
    "4. **Data Collection**: Gather data from multiple pages automatically\n",
    "5. **Research**: Extract information from various web sources\n",
    "\n",
    "## Features\n",
    "\n",
    "- **Multiple Formats**: Get content in markdown, HTML, or other formats\n",
    "- **Flexible Filtering**: Include/exclude specific HTML tags\n",
    "- **Path Control**: Control which paths to crawl or exclude\n",
    "- **Rate Limiting**: Built-in rate limiting and timeout controls\n",
    "- **JavaScript Support**: Handles JavaScript-rendered pages\n",
    "- **Clean Output**: Returns clean, structured content"
   ]
  }
 ],
 "metadata": {
  "front_matter": {
   "description": "This notebook demonstrates how to use the FirecrawlTool to scrape, crawl, and map websites.",
   "tags": [
    "tool calling",
    "tools",
    "firecrawl"
   ]
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
